Image Retrieval as Linguistic and Nonlinguistic Visual Model Matching
Author: P. Bryan Heidorn
Abstract
This article reviews research on how people use mental models of images in an information retrieval environment. An understanding of these cognitive processes can help researchers design new systems and help librarians select the systems that best serve their patrons. There are traditionally two main approaches to image indexing: concept-based and content-based (Rasmussen, 1997). The concept-based approach is used in many production library systems, while the content-based approach dominates research and some newer systems. In the past, content-based indexing supported the identification of "low-level" features in an image. These features frequently do not require verbal labels, and in many cases current computer technology can create these indexes. Concept-based indexing, on the other hand, is a primarily verbal and abstract identification of "high-level" concepts in an image. This type of indexing requires the recognition of meaning and is primarily performed by humans. Most production-level library systems rely on concept-based indexing using keywords. Manual keyword indexing is, however, expensive and introduces problems with consistency. Recent advances have made some content-based indexing practical. In addition, some researchers are working on machine vision and pattern recognition techniques that blur the line between concept-based and content-based indexing. It is now possible to produce computer systems that allow users to search simultaneously on aspects of both concept-based and content-based indexes. The intelligent application of this technology requires an understanding of the user's visual mental models of images and of the user's cognitive behavior.
P. Bryan Heidorn, Graduate School of Library and Information Science, University of Illinois, 501 E. Daniel, Champaign, IL 61820
LIBRARY TRENDS, Vol. 48, No. 2, Fall 1999, pp. 303-325
© 1999 The Board of Trustees, University of Illinois
INTRODUCTION
To better understand the relationship between concept-based and content-based indexing in a volume such as this, it is useful to refocus and re-evaluate image indexing. These techniques can be understood in a unified way by examining how each relates to "visual mental models." From this perspective, image retrieval system work is an endeavor to create a concordance between an abstract indexing model of visual information and a person's mental model of the same information. All visual information retrieval research, from the computational complexity of edge detectors to national standards for museum indexing of graphical material, is an attempt to bring the indexing model and the user's mental model into line. All index abstractions, nonlinguistic or linguistic, may be classified by their success in matching the user's abilities. Borgman (1986) emphasizes that retrieval systems should be designed around "natural" human thinking processes. The effectiveness of an index facet depends more on how well the facet harmonizes with human cognition than on whether it is linguistic (concept-based) or nonlinguistic (content-based).
In describing the content of images in the realm of art, Panofsky (1955) distinguishes between pre-iconography, iconography, and iconology. Pre-iconographic content refers to the nonsymbolic or factual subject matter of an image. It includes the generic actions, entities, and entity attributes in an image. For example, a pre-iconographic index may indicate that an image contains a stone (attribute) bridge (entity) and a river (entity). Iconographic content identifies individual or specific entities or actions. In the example, the bridge might be identified as the "Palmer Bridge" and the river as the "Hudson River." The iconologic index would include the symbolic meaning of an image.
The image might be indexed as "peaceful" or as symbolizing "simpler times." The appropriate level of indexing depends on the type of subject matter that searchers will have in mind when they search. This subject classification can be used to explain the strengths and weaknesses of content-based and concept-based indexing.
Computers frequently perform content-based indexing. They can cost-effectively identify image attributes such as color, texture, and layout. Historically, limitations in computer algorithms have restricted computer indexing to a fraction of the pre-iconographic content. This, however, is changing, and the challenge for researchers and developers is to expand the functionality of these systems. Within limited contexts, computer indexing has been able to move into iconographic subject matter. For example, by exploiting information in newspaper picture captions, a system may identify individuals in an image (Srihari, 1995). Other systems can identify and index objects such as trees or horses using low-level features such as texture and symmetry (Forsyth et al., 1996).
Linguistic concept-based indexing has traditionally been performed by humans. Although it is expensive and time consuming, it can produce indexes for all three types of subject matter described by Panofsky. Hastings (1995) demonstrated that, in some retrieval situations, searchers use a combination of visual and verbal features. With current technology, this means using both content-based and concept-based techniques. This article focuses on pre-iconographic indexing, since this is the main area where content-based and concept-based techniques overlap. Content-based techniques can be used effectively where the computer can extract and synthesize features, attributes, and entities in images that are consistent with human understanding of those images.
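One way to make Panofsky's three levels concrete is an index record with a separate facet for each level. The sketch below is a hypothetical Python illustration, not a scheme taken from this article. The `ImageIndexRecord` class, its field names, and the `matches` helper are assumptions, and the facet values simply reuse the Palmer Bridge example. In practice, a computer might fill only part of the pre-iconographic facet, while the iconographic and iconologic facets would more likely come from human indexers.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three levels of subject matter distinguished by Panofsky (1955).
LEVELS = ("pre_iconographic", "iconographic", "iconologic")


@dataclass
class ImageIndexRecord:
    """Hypothetical index record with one set of index terms per Panofsky level."""

    image_id: str
    pre_iconographic: set[str] = field(default_factory=set)  # generic entities, attributes, actions
    iconographic: set[str] = field(default_factory=set)      # specific, named entities or actions
    iconologic: set[str] = field(default_factory=set)        # symbolic meaning

    def matches(self, level: str, terms: set[str]) -> bool:
        """Return True if every query term appears in the facet for `level`."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        return terms <= getattr(self, level)


palmer = ImageIndexRecord(
    image_id="sunset-palmer-bridge",
    pre_iconographic={"stone", "bridge", "river"},
    iconographic={"Palmer Bridge", "Hudson River"},
    iconologic={"peaceful", "simpler times"},
)

print(palmer.matches("pre_iconographic", {"bridge", "river"}))  # True
print(palmer.matches("iconographic", {"Brooklyn Bridge"}))      # False
```

A searcher who thinks of "a bridge over a river" is served by the pre-iconographic facet. A searcher who wants the "Palmer Bridge" needs the iconographic facet, which low-level features alone cannot supply.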
The computer must model the image in a way that is isomorphic (but not identical) to the human model of the image. Human indexers and searchers must also form representations, or mental models, of the images if the indexer is to produce a functional index. To demonstrate the importance and pervasiveness of this process, this article explores two aspects of indexing: color and object naming (shape). The first section discusses the cognitive and social processes that give rise to the visual mental models shared by indexers and searchers. The next section explains what is meant by mental models in this context. This is followed by a discussion of how objects and shapes are represented in visual mental models and how content-based and concept-based indexes capture (or neglect) aspects of these models. Finally, the article discusses color in mental models and the concept-based and content-based approaches to indexing by color.
IMAGE ACCESS AS A SOCIOCOGNITIVE PROCESS
Imagine an image of a bridge at sunset on a winter day. What color is the sky? Is there a name for the color? What objects are in the image? Are they important? Is the sun visible, or has it already descended below the horizon? If you wanted to store this image with 100,000 others, how would you find it again? How would you describe it so that someone else could find it? Would words be enough? The answer to all of these questions depends on personal history and cultural expectations. The act of indexing and accessing images from a database is a sociocognitive process grounded in both biology and experience. The term "sociocognitive" here means a combination of the social aspects of cognition and the individual aspects of mental life. Cognition refers to all processes involved in the perception, transformation, storage, retrieval, manipulation, and use of information by people.
Of particular interest here are those aspects of cognition that are called mental models. In a social context, we often wish to communicate our thoughts to others. We frequently do this with language, but also through posture, gestures, hand-drawn illustrations, or, for the gifted, works of art. Communication between people is an act of one person referencing and changing the representations used in the cognition of another person: what that person is thinking about, and even how they are thinking. In this context, indexing is a form of communication between the indexer and the people who will search for images in a collection. The indexer must rely on both shared cognitive heritage and social conventions to represent salient aspects of an image in the indexing scheme. The searchers, in using the index, must express their interests in the same language that the indexers used.
In the first paragraph of this section, you were asked, through natural language, to create a "visual mental model" or "image" in your mind. Each reader's image is different, but certainly some aspects of the image are shared among readers. Some of these aspects may be based on the shared biology of our vision systems (most of us can imagine color), and some may be attributable to shared experience. We all know what bridges are without having been born with that knowledge. Some aspects of the visual mental model are easily described with natural language or verbal tags. Other aspects seem to defy simple linguistic description. "Although grammars provide devices for conveying rough topological information such as connectivity, contact, and containment, and coarse metric contrasts such as near/far or flat/globular, they are of very little help in conveying precise Euclidean relations: a picture is worth a thousand words" (Pinker & Bloom, 1995, p. 715).
This linguistic versus nonlinguistic contrast parallels the distinction between concept-based and content-based indexing techniques. Understanding these mental models of images, and how we can communicate information about them, can shed light on both kinds of indexing. Shera (1965) identified prerequisites for constructing a framework for indexing (an indexing vocabulary). These include an understanding of language and the communication process, as well as an understanding of the relationship between human thought and the mechanisms for recording thought, such as language (p. 56). Indexers and system designers need to understand human cognition and communication in order to produce good indexes. Shared cognitive abilities and shared experience serve as the basis for this communication. These shared attributes may arise from general world experience, as in the earlier sunset example. Other attributes may arise from specialized training, such as when an architect uses the Art and Architecture Thesaurus (Barnett & Petersen, 1989) to access a cultural heritage image collection, or when a botanist uses the language of an identification key to label a specimen. In both cases, these cognitive attributes are learned in a social context.
In this discussion, the term "sociocognitive" is intended in its broadest sense. The social context here includes the conventions that allow indexers and searchers to learn common terminology: the natural and synthetic ontologies for image description. These aspects of the social environment exist in deep interplay with the shared cognitive abilities, biases, and frailties of the image access community. Cognitive abilities include not only "higher" cognitive processes but also the perceptual experience that is often the object of those processes.
In this article, we do not focus on the social processes through which indexers create indexing standards, although these are certainly important. The focus here is on the social environment that gives rise to the indexer's thoughts about images. Jacob and Shaw (1999) introduce a sociocognitive perspective on representation. From their perspective, and from that of this article, language and communication influence the organization of knowledge at both the individual and the social level. Social processes lead to the creation of a shared vocabulary to describe a field. However, the Jacob and Shaw treatment is primarily limited to linguistic constructs: "[R]epresentation is primarily linguistic, the development of truly effective systems of retrieval must include a thorough appreciation of how language is used in the social processes of communicating knowledge" (p. 131). In describing images, though, content-based indexing techniques introduce nonlinguistic forms of indexing (and communication), so this sociocognitive perspective must be extended to include nonlinguistic processes (such as color and texture maps).
For images, it is clear that descriptions are grounded first in the perceptual abilities of indexers and searchers. This does not diminish the critical role of natural language in image description. The creation of a vocabulary to describe images is a Darwinian adaptation and is universal to the species. This language learning is a sociocognitive process. For example, the perception of color is physical, but color names are arrived at through a social process. People can distinguish millions of colors (Bruner, Goodnow, & Austin, 1956, p. 1), but only some are named.
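The gap between the many colors people can distinguish and the few they name can be illustrated with a small sketch of linguistic color indexing, in which each color receives the name of its nearest prototype. The prototype list, the `name_color` function, and the use of plain Euclidean distance in RGB space are all illustrative assumptions. A perceptual space such as CIELAB would generally be a better basis for distance, and real naming boundaries vary across people and cultures.

```python
from __future__ import annotations

import math

# Hypothetical RGB prototypes for a handful of basic color names.
COLOR_PROTOTYPES = {
    "red": (255, 0, 0),
    "orange": (255, 165, 0),
    "yellow": (255, 255, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "purple": (128, 0, 128),
    "pink": (255, 192, 203),
    "brown": (139, 69, 19),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "gray": (128, 128, 128),
}


def name_color(rgb: tuple[int, int, int]) -> str:
    """Return the name whose prototype is nearest to rgb (Euclidean distance in RGB)."""
    return min(COLOR_PROTOTYPES, key=lambda name: math.dist(rgb, COLOR_PROTOTYPES[name]))


for pixel in [(250, 10, 5), (255, 140, 0), (230, 150, 40)]:
    print(pixel, "->", name_color(pixel))
# (250, 10, 5) -> red
# (255, 140, 0) -> orange
# (230, 150, 40) -> orange
```

The two sunset-like colors (255, 140, 0) and (230, 150, 40) are easy to tell apart on screen, yet both receive the label "orange." This is the information a designer gives up by indexing with category names alone.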
An information retrieval system designer must decide whether a collection should be indexed using unlabeled colors (i.e., color histograms) or using labeled category names such as "red," "green," or "blue." The designer may also choose to use both nonlinguistic and linguistic approaches. The decision must be made on both sociocognitive and technical grounds. In the mental image of a bridge at sunset, it might be reasonable to apply the label "red" to the sky. However, the colors in an actual sunset, or in our mental images, may defy our language skills. Figure 1, "Sunset, Palmer Bridge, New York," is a digital image from the American Memory Collection at the Library of Congress (Detroit Photographic Co., ca. 1900). In this image, the sky's color does not have a name on which many people would agree. The designer must decide whether users have a word to describe, for example, the particular shade of sunset needed to complement the color of a car in an automobile advertisement. Nonlinguistic, content-based color retrieval is provided in current commercial and research image database systems such as Virage (Gupta et al., 1997), VisualSEEK (Smith & Chang, 1996a), QBIC (Niblack et al., 1992; Flickner et al., 1995), and Photobook (Pentland, 1993). These systems offer, among other things, color swaths, color-mixing interfaces, perceptually significant coefficients, and color similarity matching, as discussed in the section on models of color.
Figure 1. "Sunset, Palmer Bridge, New York."
MENTAL MODELS OF IMAGES
A person searching for an image in a collection may be thought of as searching for images that match a mental model of the image being sought. The mental model of the target may change during the retrieval session. This does not change the fact that a dynamic mental model exists, nor does it change how that model is constructed.
If the collection is small enough, the searcher may browse the images looking for one that matches the mental model. When the collection becomes too large for efficient browsing, other search strategies must be employed. In the realm of image databases, the searcher may use an index. The appropriate form of the index is governed by the nature of the mental representation. All current indexing techniques, manual and automatic, linguistic and nonlinguistic, are attempts to make aspects of the mental representation explicit and to match these aspects to the images in the collection. As depicted in Figure 2, aspects of the visual world are abstracted by the searcher and by the indexer. The indexer must select aspects of the abstraction that are shared by the indexer and searcher and code them into the index, so that the index itself is an abstraction of the visual world. Because of the nature of this matching process and the complexity of visual mental models, neither concept-based nor content-based indexing alone is sufficient to support an effective retrieval system. The best aspects of these approaches to indexing need to be identified and integrated.
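One simple way to integrate the two kinds of index is a two-stage search. Keywords (a concept-based index) filter the collection, and a coarse color histogram (a content-based index) ranks the remaining images by similarity to an example. This is a hypothetical Python sketch, not the method of Virage, QBIC, or any other system named in this article. The 64-bin RGB quantization, the record layout, the toy pixel data, and the function names are assumptions. Similarity is measured by histogram intersection, the measure Swain and Ballard introduced for color indexing.

```python
from collections import Counter

BINS_PER_CHANNEL = 4  # assumption: 4 x 4 x 4 = 64 coarse RGB bins
STEP = 256 // BINS_PER_CHANNEL


def color_histogram(pixels):
    """Normalized, quantized RGB histogram: a nonlinguistic, content-based index."""
    counts = Counter((r // STEP, g // STEP, b // STEP) for r, g, b in pixels)
    total = len(pixels)
    return {bin_key: n / total for bin_key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of per-bin minima; 1.0 means identical normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def make_record(image_id, keywords, pixels):
    """Hypothetical record holding both a concept-based and a content-based index."""
    return {"id": image_id, "keywords": set(keywords), "histogram": color_histogram(pixels)}


def search(collection, keywords, query_pixels):
    """Keep records that have all query keywords, then rank them by color similarity."""
    query_hist = color_histogram(query_pixels)
    hits = [rec for rec in collection if keywords <= rec["keywords"]]
    return sorted(
        hits,
        key=lambda rec: histogram_intersection(query_hist, rec["histogram"]),
        reverse=True,
    )


# Toy "images" as flat pixel lists (hypothetical data).
collection = [
    make_record("bridge-sunset", {"bridge", "river", "sunset"},
                [(240, 120, 60)] * 70 + [(20, 20, 60)] * 30),
    make_record("bridge-noon", {"bridge", "river"},
                [(120, 170, 230)] * 80 + [(90, 90, 90)] * 20),
    make_record("barn-sunset", {"barn", "sunset"}, [(240, 120, 60)] * 100),
]

# Query: a keyword plus an example of the desired colors.
query = [(235, 125, 50)] * 60 + [(25, 25, 50)] * 40
ranked = search(collection, {"bridge"}, query)
print([rec["id"] for rec in ranked])  # ['bridge-sunset', 'bridge-noon']
```

Because the keyword filter runs first, "barn-sunset" is excluded despite its sunset colors. Ranking first by color and then filtering, or blending the two scores, would be equally plausible design choices.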
Similar resources
A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features
Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we proposed an image retrieval method using global and local features. Firstly, for local features extraction, SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest points and then to use Bag of Visual Words model, a cluster...
Interactive Retrieval of Nature Images Using Multiple-Instance Learning
Content-based image retrieval (CBIR) has received considerable research interest in the recent years. The basic problem in CBIR is the semantic gap between the high-level image semantics and the low-level image features. Region-based image retrieval and learning from user interaction through relevance feedback are two main approaches to solving this problem. Recently, the research in integra...
Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval
Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and onto...
Content Based Radiographic Images Indexing and Retrieval Using Pattern Orientation Histogram
Introduction: Content Based Image Retrieval (CBIR) is a method of image searching and retrieval in a database. In medical applications, CBIR is a tool used by physicians to compare the previous and current medical images associated with patients pathological conditions. As the volume of pictorial information stored in medical image databases is in progress, efficient image indexing and retri...
Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix
In this article, a fabulous method for database retrieval is proposed. The multi-resolution modified wavelet transform for each of image is computed and the standard deviation and average are utilized as the textural features. Then, the proposed modified bit-based color histogram and edge detectors were utilized to define the high level features. A feedback-based dynamic weighting of shap...
Journal title:
- Library Trends
Volume 48, Issue
Pages -
Publication date: 1999